-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Add CTE support to select in QueryBuilder #6621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@nio-dtp, thanks for the PR. As this is a new feature, please retarget against |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a very basic unit test that would demonstrate the expected SQL and an integration test that would run the resulting query on the platforms that support CTE.
I'm curious to see how this will work with query parameters (both positional and named).
2936e39
to
686bf02
Compare
79cf2fb
to
1e545fc
Compare
@morozov Thanks for the review. I've added some tests and a I'm not sure if it should check the platform supports in the QueryBuilder or if it is enough to mention in the documentation that it is not supported for deprecated mysql 5.7 |
We try to avoid those If we did that, what would be the worst this that could happen? MySQL 5.7 users will get a syntax error if and only if they try to run a query with a CTE. I could live with that. If we really want a nicer error message for MySQL 5.7 users, we could move the Either way: Please remove the |
Thanks @nio-dtp for providing this PR. ❤️ Love to see that other has the same needs and staying on the DBAL way to implement this. Just let me add my five cents to this. First, I would remove the platform support cte methods as @derrabus already mentioned and asked for. In the end, we would end up with multiple support methods if we want to take care beforehand of support questions, because even a lot of vendors supports CTE in general, they have dirty little differences. I'm here more for let the database report if it does not understand something and that it needs to be done from the application then. DBAL should here only provide a generic way to let WITH (CTE) syntax in general created and used. The second point I want to put a light on is the current implementation of the Regarding the WITH
baseCTE AS (SELECT id, title FROM tableName),
secondCTE AS (SELECT id, title FROM baseCTE where id < 1000)
SELECT * from secondCTE If public function with(string $name, string|\Stringable|self $part): self
{
$this->with = [];
$this->with[] = [ /* adding first part */ ];
}
public function addWith(string $name, string|\Stringable|self $part, string ...$dependsOn): self
{
$this->with[] = [
/* adding additional part */
];
} The current implemetation would already allow to add a WITH RECURSIVE
recursiveCTE AS (
-- initial query
SELECT id, parentId, title FROM table_a WHERE parentId = 0
UNION
-- recursive query
SELECT b.id, b.parentId, b.title
FROM table_b AS b
WHERE b.parentId > 0 AND b.parentId = recursiveCTE.id
)
SELECT * FROM recursiveCTE With recursive CTE the whole thing gets a little more nifty, as different vendors requires or allow lazyness on different levels. For example:
WITH RECURSIVE
recursiveCTE(virtual_id, virtual_title, virtual_parentid) AS (
-- initial query
SELECT
id AS virtual_id,
title AS virtual_title,
parentId AS virtual_parent_id
FROM table_a WHERE parentId = 0
UNION
-- recursive query
SELECT
b.id AS virtual_id,
b.title AS virtual_title,
b.parentId AS virtual_parentid
FROM table_b AS b
WHERE b.parentId > 0 AND b.parentId = recursiveCTE.virtual_id
)
SELECT * FROM recursiveCTE That is not fully possible with the current implementation. So we could consider
In any case, the From a internal handling perspective, I suggest to introduce a internal class for cte parts similar to the internal Join part class along with the options required (name, field, depends, isRecursive etc) which would allow to omit a dedicated flag and later iterate the array and add the RECURSIVE keywoard as soon as at least one part has it set. Or a additional container around the parts could be implemented, which internal tracks that when a recursive part gets added. I would not make that class public API. With such a To be honest, I did not investigated yet which DBAL supported platforms allows RECURSIVE cte's on top of normal CTE's and which not and this was still outstanding. I started a custom implementation a couple of months ago [1], but delayed it due to time constraints and implemented it as a custom solution within the TYPO3 decorated QueryBuilder (prefixed) as a test-baloon [2] as we needed it for the release after implementing a first (quite advanced) usage [3] To summerize this, this PR is a good start but in my eyes not finished yet (without wanting to blame it), because in general it is the same I started with before adding the additional stuff. To be honest, in my working state I had also the support methods but already considered to drop them again before making a pull-request out of it. I propose that we clarify first what Doctrine DBAL wants to support and what not out of all these things and than where to continue. Either (if @nio-dtp) is open to adopt it and continue or if I should polish and finish my work (which is alrady some steps further) and adopt to the decisions made (dependency sorting or not, internal class usage or not, ...) Suggested method(s): /**
* @param non-empty-string[] $fields
*/
public function with(
string $identifier,
string|\Stringable|QueryBuilder $part,
array $fields = [],
WithType $type = WithType::SIMPLE, // WithType::RECURSIVE for recursive part
): self {}
/**
* @param non-empty-string[] $dependsOn
* @param non-empty-string[] $fields
*/
public function addWith(
string $identifier,
string|\Stringable|QueryBuilder $part,
array $dependsOn = [],
array $fields = [],
WithType $type = WithType::SIMPLE, // WithType::RECURSIVE for recursive part
): self {} Developers implementing a recursive CTE needs to use a dedicated QueryBuilder instance to build a union query using the union support to define it, if not done the database should report this as an issue and not tried to scan or throw a custom exception from doctrine. Recursive CTE's have been the reason why I started and contributed the union support for the QueryBuilder as a preparation for providing a CTE implementation. In my POC/WIP i extendted the I just would not merge this one here to quickly before considering the aforementioned points, at least for a overall strategy and either making it directly or with followups.
|
Thanks you to all the reviewers and their detailed feedback, I'll rework my PR and make a more complete proposal.
|
I'd improve the following aspects:
As for the "reset" method – what is the use case for it? It would basically allow to disregard the logic of a pre-built query. If it wasn't necessary, why declare it in the first place? |
I talked with @derrabus on sunday about this and came up with following using 4 methods: /**
* @param string[] $fields
*/
public function with(
string $name,
string|\Stringable|self $part,
array $fields = [],
): self {}
/**
* @param string[] $fields
*/
public function addWith(
string $name,
string|\Stringable|self $part,
array $fields = [],
): self {}
/**
* @param string[] $fields
*/
public function withRecursive(
string $name,
string|\Stringable|self $initialOrUnionPart,
string|\Stringable|self|null $recursivePart,
array $fields = [],
): self {}
/**
* @param string[] $fields
*/
public function addRecursive(
string $name,
string|\Stringable|self $initialOrUnionpart,
string|\Stringable|self|null $recursivePart,
array $fields = [],
): self {} If a UNION query is passed as the first requried part for the recursive vairants, it is simply used. If both parts are passed, internally a union query should be created (using the union querybuilder api without allowing duplicates). Should be explained within the method phpdocblocks and by providing a concrete single part the developer has full power about the recursive union block. to follow semantic and logic similar to with or withRecursive will reset the internal array and create a first element, whereas the addWith* methods adds an additionall entry to the internal array (DTO object). The We think that a dedicated It should be possible to define the fields for the CTE part, but having that optional: WITH
customCte(virtual_id, virtual_field)
AS (SELECT id AS virtual_id, somefield AS virtual_field FROM sometable)
SELECT * from customCte Regarding my point for the `depends, in special for the recursive CTE's we discussed and decided to not provide an API in this low-level implementation. Developers or frameworks (for example ORM or similar) should keep track on there own and add the parts in the required and correect order. No custom sorting or cylcing detection, will be reported by the database when executed. We had no hard meaning for the internal implementation and building the query, so the current switch form may be okay. Personally, I think it would make more sense to move that into the DefaultSelectQueryBuilder and pass the with array (of DTO's) within the with/addWith and withRecursive/addWithRecursive naming would follow the semantic Doctrine already has for the other parts, exceopt that the *Recursive variant makes it more clear. Sure, PHPDoc block needs to make that clear. In the end, it kind of follows the design approach of the QueryBuilder. And it is noticable that the top (most outer) querybuilder instance (in most cases that instance where the with/addWith/withRecursive/addWithRecursive methods are used) needs to be used for creating named placeholders (parameters). That matches the same requriement and flow as it is needed for using QueryBuilder to create sub queries or for the union support already and can therefore be taken as expectable. (Remark to #6621 (comment)) Taking this, we could make this one a first implementation for the My question here is, if @nio-dtp wants to update and work on this and has the time for it the next time. Otherwise I would rebase and finish my work in my fork and provide an additional PR in the next two weeks (which was a start after an original pitch and discussion with @derrabus but not added as PR(draft pr) yet due to time constraints). I'm totally fine not to do anything and test/verify/review this later on but also being fine finishing mine (because of access) and mention @nio-dtp as co-author then. |
If it is ok for all, I'll make a proposal by the end of the week without the recursive part. |
A small remark on this: I would remove the |
Thanks @nio-dtp, and that is pretty fine. I will add the recursive part afterwards in a follow up PR. |
48c0cb7
to
82a9c20
Compare
I'm a bit confused why we're merging parameters from different QBs now. Didn't agree that we don't do that? |
So, yes, please remove the whole parameter merging stuff. It bloats and complicates this PR. Just as in subqueries and unions, parameters are managed on the outermost QB. |
I think there was a misunderstanding, that's why I've added this part |
In my opinion, the behavior of ignoring the parameters bound to subqueries is wrong, and #6525 is the evidence of that. The API allows binding parameters and ignores them, so the real behavior can be discovered only via testing. This is bad. If there was no precedent with the UNION queries, I wouldn't allow an API like this to be introduced. On the other hand, this PR won't break anything, it will introduce just another poor abstraction, so I'm not going to push back. |
* | ||
* @throws QueryException Setting an empty array as columns is not allowed. | ||
*/ | ||
public function with(string $name, string|QueryBuilder $part, ?array $columns = null): self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe if the outer builder will ignore the parameters bound to the inner one, it shouldn't accept QueryBuilder
and should only accept string
? This way, the caller will have to call toSql()
on the inner builder and thereby acknowledge that its parameters won't be used.
This is my first contribution to this repository and I admit that I was missing some knowledge about this project and his conception, especially #6525 It would be great to decide first if the method |
That's okay. You wouldn't have gone through all the discussions back and forth if we had more consistent views between maintainers but both, Alexander and I have quite limited time that we can dedicate to the project, so this is a side effect of that. Please bear with us and sorry for such a rough experience.
This sounds like an option but still, we'll have to develop a plan forward. Let me outline the options that I'd be comfortable with. Option 1. The most puristicThe UNION design isn't the best, so before reusing it for other features, we can fix it. Make it respect the parameters bound to the inner builders (i.e., address #6525) and then add the support for CTEs with the same behavior. Option 2. The quickest oneAccept only strings for CTE expressions as proposed in #6621 (comment). This way, we will have the behavior consistent with the UNION queries but less confusing. Whether or not we make the same changes in the UNION is out of the scope. I'm fine with either and am open to other proposals. Depending on your @nio-dtp goals (whether you just want to contribute the support for CTEs or make other contributions to DBAL), please choose whatever works for you.
Not sure what you mean by "considered". Could you elaborate? |
I was trying to tell that calling
Between the 2 options, I'd prefer the first. But maybe we could think about a third option. An object could manage the query parameters for all the query builders instanced by a single connection instance. This commit is an example of such implementation, using a singleton. Tests are passing locally but I'm sure I've missed some points, as it is out the box. Tell me if you think It would be a good thing to work more further this solution. Or if we keep going with the first solution. |
The third option looks like over-complication. The connection should not be involved in the process of building a query, it's the job of the query builder. The fact that some queries are built from subqueries is an implementation detail that shouldn't leak outside of the builder.
We definitely could but the one proposed above doesn't look better than the others. |
Now I understand what you're saying but I'm not sure I agree. In my understanding, the state of a query builder should encapsulate the query and the values of the bound parameters. Having Let's assume the outer query uses positional parameters and two inner query internally. How is the user of the outer query supposed to address the parameters of the second inner query? Do they need to keep in mind the number of parameters in the outer query that precede the inner ones and the number of parameters in the first inner query? This looks unmanageable. |
I agree with that and I didn't found other way to address an object that manage all the parameters.
Assuming this, I'll address the Option 1, and first #6525
Ok you're right, I didn't imagine this use case |
@morozov I don't think that the back-and-forth on this PR is especially helpful. I understand that merging parameters might be desirable or feel more intuitive, but we have not done this yet. Not for subqueries, not for unions. This PR is about adding CTEs to the QB which is something we can deliver without merging parameters. Let's use #6525 to discuss if and how we want to merge parameters. This can be done in a follow-up, independently from the CTE feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this PR looks ready to be merged. I understand that not everybody is happy with not merging parameters of different query builders, but that's something we can discuss for a follow-up.
I'm fine with merging it as is. |
Thanks @nio-dtp ! |
Fixes #5018.
Summary
This pull request introduces support for Common Table Expressions (CTEs) across various database platforms and updates the
QueryBuilder
to utilize this feature.